 global convergence



On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms

Lam M. Nguyen

Neural Information Processing Systems

The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD, which matches the mainstream practical heuristics. We show that shuffling SGD converges to a global solution for a class of non-convex functions in over-parameterized settings.
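As a rough illustration of what "shuffling" means here, the sketch below runs SGD with a fresh random permutation of the data in every epoch, so each sample is visited exactly once per pass. It is not the paper's algorithm or setting: the toy problem is a convex linear least-squares fit with more parameters (5) than samples (3), chosen only because over-parameterization lets the model interpolate the data. The names `shuffling_sgd` and `mean_squared_loss`, the data, the step size, and the epoch count are all assumptions made for this example.

```python
import random


def shuffling_sgd(xs, ys, epochs=200, lr=0.5, seed=0):
    """Random-reshuffling SGD on a toy linear least-squares problem (illustrative only)."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # draw a new permutation at the start of each epoch
        for i in order:     # visit every sample exactly once per epoch
            pred = sum(wj * xj for wj, xj in zip(w, xs[i]))
            err = pred - ys[i]
            # gradient of 0.5 * err**2 with respect to w is err * x_i
            w = [wj - lr * err * xj for wj, xj in zip(w, xs[i])]
    return w


def mean_squared_loss(w, xs, ys):
    """Average of 0.5 * (prediction - target)**2 over all samples."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wj * xj for wj, xj in zip(w, x))
        total += 0.5 * (pred - y) ** 2
    return total / len(xs)


# Hypothetical over-parameterized data: 3 samples, 5 parameters.
xs = [
    [1.0, 0.0, 0.5, 0.0, 0.2],
    [0.0, 1.0, 0.0, 0.5, 0.1],
    [0.3, 0.2, 1.0, 0.0, 0.0],
]
ys = [1.0, -1.0, 0.5]

initial_loss = mean_squared_loss([0.0] * 5, xs, ys)
w = shuffling_sgd(xs, ys)
final_loss = mean_squared_loss(w, xs, ys)
print(initial_loss, final_loss)
```

Because the three samples are linearly independent, an exact interpolating solution exists, and the loss on this toy problem should shrink toward zero. Replacing `rng.shuffle(order)` with independent sampling of indices would give ordinary with-replacement SGD instead of the shuffling variant.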


On the Convergence of Encoder-only Shallow Transformers

Neural Information Processing Systems

In addition, a neural tangent kernel (NTK) based analysis is given, which facilitates a comprehensive comparison. Our theory demonstrates a separation in the importance of different scaling schemes and initializations.